
feat(quantization): Add GPTQ n-bit quantization support #21551


Status: Open. Wants to merge 21 commits into base: master.

Conversation

@amitsrivastava78 (Collaborator) commented Aug 6, 2025

This commit integrates the GPTQ (Generative Pre-trained Transformer Quantization) algorithm into Keras.

Key features include:

  • A new GPTQConfig for configuring quantization parameters.
  • Integration with base Keras models via a model.quantize() method (see the usage sketch below).
  • Support for custom datasets; tested with models including GPT-2, OPT, Bloom, and Gemma 3.
  • Unit tests to verify perplexity and model functionality post-quantization.
  • A Colab demonstrating the feature is attached here.
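
A minimal usage sketch of the flow described above. The parameter names (dataset, tokenizer, wbits, group_size, symmetric, act_order) follow the review discussion below; the model and tokenizer loading are placeholders, not part of this PR:

    import keras
    from keras.quantizers import GPTQConfig  # export path per the review discussion

    # Placeholders: an LLM and tokenizer loaded elsewhere (e.g. via keras_hub).
    model = ...
    tokenizer = ...

    # A small calibration set; any list of strings (or pre-tokenized samples)
    # should work according to the PR description.
    calibration_data = ["The quick brown fox jumps over the lazy dog."] * 32

    config = GPTQConfig(
        dataset=calibration_data,  # calibration samples used to collect activations
        tokenizer=tokenizer,       # used to process `dataset` if it contains strings
        wbits=4,                   # quantize weights to 4 bits
        group_size=128,            # per-group quantization granularity
        symmetric=False,           # asymmetric quantization
        act_order=False,           # optionally reorder columns by activation importance
    )

    # Quantizes the supported layers in place using the GPTQ algorithm.
    model.quantize("gptq", config=config)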

@github-actions bot added the Gemma (Gemma model specific issues) label on Aug 6, 2025
@gemini-code-assist bot (Contributor) left a comment

Summary of Changes

Hello @amitsrivastava78, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've implemented support for GPTQ (Generative Pre-trained Transformer Quantization) n-bit quantization within Keras. This feature allows for significantly reducing the memory footprint and improving the inference speed of large language models by quantizing their weights to lower precision, such as 4-bit. The integration provides a streamlined way to apply GPTQ to Keras models, enabling more efficient deployment on resource-constrained hardware while aiming to maintain model performance.

Highlights

  • New GPTQ Quantization Mode: I've introduced a new 'gptq' quantization mode, expanding the existing QUANTIZATION_MODES to support this advanced n-bit quantization technique.
  • Extended model.quantize() Method: The model.quantize() method has been updated to recognize and process the 'gptq' mode. This now requires a dedicated GPTQConfig object, which encapsulates all necessary parameters for the GPTQ algorithm, ensuring proper configuration and execution.
  • New Quantization Modules: I've added a new quantizers directory containing the core logic for GPTQ. This includes the GPTQ class for layer-specific operations, GPTQConfig for overall parameter management, gptqutils for data loading and layer-wise application, and quant for fundamental quantization functions.
  • Enhanced Testing for GPTQ: Comprehensive unit tests have been added and updated in model_test.py to validate the GPTQ implementation. These tests cover various scenarios, including different dataset types (in-memory, generator, and public datasets like WikiText2) and ensure the quantized models retain functionality.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces GPTQ n-bit quantization support to Keras, a significant feature. The implementation is well-structured, separating configuration, core logic, and utilities. The changes include a new GPTQConfig, integration into model.quantize(), the GPTQ algorithm implementation, and corresponding tests. My review has identified a few areas for improvement: removing leftover debug code, fixing an inconsistent error message, using HTTPS for downloads, addressing some dead code in tests, and refactoring for maintainability by reducing code duplication. Additionally, there's a performance consideration regarding tensor concatenation within a loop that could be optimized.
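
On the last point, the usual remedy is to collect tensors in a Python list and concatenate once after the loop rather than concatenating on every iteration; a generic sketch (not the PR's actual code):

    from keras import ops

    def gather_batches(batches):
        # Anti-pattern: `collected = ops.concatenate([collected, batch], axis=0)`
        # inside the loop re-copies the accumulated tensor on every iteration.
        # Preferred: append to a list and concatenate once at the end.
        collected = []
        for batch in batches:
            collected.append(batch)
        return ops.concatenate(collected, axis=0)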

@codecov-commenter commented Aug 6, 2025

Codecov Report

❌ Patch coverage is 89.05325% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.79%. Comparing base (8c55abe) to head (37370e0).
⚠️ Report is 2 commits behind head on master.

Files with missing lines                            Patch %   Lines
keras/src/quantizers/gptq_quant.py                   72.88%   8 missing, 8 partials ⚠️
keras/src/quantizers/gptq.py                         89.71%   6 missing, 5 partials ⚠️
keras/src/quantizers/gptq_core.py                    93.52%   4 missing, 5 partials ⚠️
keras/api/_tf_keras/keras/quantizers/__init__.py      0.00%   1 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #21551      +/-   ##
==========================================
+ Coverage   82.75%   82.79%   +0.04%     
==========================================
  Files         567      571       +4     
  Lines       56471    56807     +336     
  Branches     8818     8883      +65     
==========================================
+ Hits        46730    47033     +303     
- Misses       7580     7597      +17     
- Partials     2161     2177      +16     
Flag Coverage Δ
keras 82.60% <89.05%> (+0.04%) ⬆️
keras-jax 63.93% <89.05%> (+0.14%) ⬆️
keras-numpy 58.03% <15.68%> (-0.26%) ⬇️
keras-openvino 34.54% <12.72%> (-0.14%) ⬇️
keras-tensorflow 64.37% <89.05%> (+0.15%) ⬆️
keras-torch 63.97% <89.05%> (+0.14%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@JyotinderSingh (Collaborator) left a comment

Thanks for this PR! I have left a few initial comments.

@JyotinderSingh (Collaborator) commented

It would be helpful to attach colabs to the PR description showing improvements over raw 4-bit quantization for models that this feature has been tested with.

@hertschuh (Collaborator) left a comment

Thanks for the PR! There is a lot going on.

This is just a first pass, mostly high level / API comments.

@amitsrivastava78 (Collaborator, Author) commented

It would be helpful to attach colabs to the PR description showing improvements over raw 4-bit quantization for models that this feature has been tested with.

The Colab is now attached to the PR.

@divyashreepathihalli (Collaborator) left a comment

Thank you for this PR! I left a few comments.
The code here is lacking test coverage; the Codecov report suggests the same, with 62.40409% of the code missing test coverage.

@hertschuh (Collaborator) left a comment

One of the expectations is that after quantization, it should be possible to

  • run the model (using the quantized kernels)
  • save it (keeping it quantized)
  • reload it (keeping it quantized)
  • run it again (quantized).

And also

  • export it (which should trace it with the quantized kernels)

I don't see the hooks needed in Dense.quantized_call and EinsumDense.quantized_call and the variables to support that.


About the overall design:

  • GPTQConfig is the global config.
  • GPTQQuant is most importantly the "state" (quantized kernel), although it also has config and some logic to determine this state. Note that the logic to dequantize is separate.
  • GPTQ is the wrapper that connects together one layer with one GPTQQuant, so there are many GPTQ instances, and GPTQ is not where the core loop lives.

Let's find some names (or maybe even a different structure) that make it easier to follow this. I also find the split across 3 files (`gptq.py`, `gptqquant.py`, `gptqutils.py`) a little hard to navigate.
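
As a reading aid, the relationship described above in schematic form (class names from the PR; the bodies and signatures are invented for illustration):

    class GPTQConfig:
        """Global, user-facing configuration (bits, group size, dataset, ...)."""

    class GPTQQuant:
        """Per-layer quantization state (scales, zero points, quantized kernel),
        plus the logic that computes that state. Dequantization lives elsewhere."""

    class GPTQ:
        """Wrapper tying one layer to one GPTQQuant instance; the core loop over
        layers and calibration batches lives outside this class."""

        def __init__(self, layer, quantizer):
            self.layer = layer
            self.quantizer = quantizer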

self.group_size = group_size
self.symmetric = symmetric
self.act_order = act_order
self.quantization_method = "gptq"
Collaborator:

What is this for?

Collaborator (Author) replied:

This holds the group size, whether symmetric quantization is used, and whether act_order is needed; it is used while performing the GPTQ quantization.

@hertschuh (Collaborator) commented Aug 13, 2025

Oh sorry, GitHub makes this confusing because it always shows 4 lines of context.

I meant, what is self.quantization_method = "gptq" for?

It's never accessed and is implied by the fact that this is GPTQConfig.

return ops.multiply(scale, dequantized_x)


class GPTQQuant:
Collaborator:

What does Quant stand for? Quantizer? Quantized(Kernel)? Quantization?

Collaborator (Author) replied:

"Quant" stands for Quantization

Collaborator:

Can you call it GPTQQuantization? Let's limit abbreviations when there is ambiguity.

@amitsrivastava78 dismissed JyotinderSingh's stale review on August 12, 2025 at 09:03

Colab link shared with Jyotinder.

This commit integrates the GPTQ (Generative Pre-trained Transformer Quantization) algorithm into Keras.

Key features include:
- A new `GPTQConfig` for configuring quantization parameters.
- Integration with base Keras models via a `model.quantize()` method.
- Support for multiple datasets (WikiText2, PTB, C4, custom datasets); tested with models including GPT-2, OPT, Bloom, and Gemma 3.
- Includes unit tests to verify perplexity and model functionality post-quantization.
@hertschuh (Collaborator) left a comment

@amitsrivastava78

I didn't see a response to:

One of the expectations is that after quantization, it should be possible to

  • run the model (using the quantized kernels)
  • save it (keeping it quantized)
  • reload it (keeping it quantized)
  • run it again (quantized).

And also

  • export it (which should trace it with the quantized kernels)

I don't see the hooks needed in Dense.quantized_call and EinsumDense.quantized_call and the variables to support that.
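
For illustration, the kind of hook being asked about might look like this in Dense.quantized_call (a hedged sketch only; the attribute names quantized_kernel, kernel_scale, and kernel_zero are assumptions, not variables from this PR):

    from keras import ops

    def quantized_call(self, inputs):
        # Hypothetical GPTQ branch: dequantize the stored low-bit kernel on the
        # fly, then run the usual matmul. Real code would keep the kernel packed
        # and use the quantized variables created at quantization time.
        kernel = ops.multiply(
            self.kernel_scale,
            ops.subtract(ops.cast(self.quantized_kernel, "float32"), self.kernel_zero),
        )
        outputs = ops.matmul(inputs, kernel)
        if self.bias is not None:
            outputs = ops.add(outputs, self.bias)
        return outputs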

else:
    # Test for valid cases where no error should occur
    try:
        model.quantize(mode, config=config)
Collaborator:

Per my PR-level comment, can you add model.save, then reload, and verify it's still quantized?
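
Roughly along these lines (a sketch; sample_inputs and the tolerance are placeholders, and the exact assertions depend on how the quantized variables end up being stored):

    import os
    import tempfile

    import numpy as np
    import keras

    def check_quantize_save_reload(model, config, sample_inputs):
        model.quantize("gptq", config=config)
        reference = model.predict(sample_inputs)

        path = os.path.join(tempfile.mkdtemp(), "quantized_model.keras")
        model.save(path)
        reloaded = keras.saving.load_model(path)

        # The reloaded model should still be quantized and match the
        # pre-save outputs.
        np.testing.assert_allclose(
            reloaded.predict(sample_inputs), reference, atol=1e-5
        )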

from keras.src.quantizers.gptq_core import quantize_model


@keras_export(["keras.GPTQConfig", "keras.quantizers.GPTQConfig"])
Collaborator:

I don't think we should export this as keras.GPTQConfig; very few things should be at the top level. Any reason to do that?

keras.quantizers.GPTQConfig alone works.
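
i.e. something like (a sketch of the suggested change):

    @keras_export("keras.quantizers.GPTQConfig")
    class GPTQConfig:
        ...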

@fchollet (Collaborator) left a comment

Thanks for the PR!

on their activation's second-order information.
"""

W = ops.transpose(ops.cast(self.layer.kernel, "float32"))
Collaborator:

Throughout the code, there's a lot of use of single-letter capital variables, which is against our code style. Variables should be lowercase, with underscores if needed, and they should use reasonably descriptive names rather than single letters.
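
For example (illustrative only):

    # Instead of:
    #     W = ops.transpose(ops.cast(self.layer.kernel, "float32"))
    # prefer a descriptive lowercase name:
    weights = ops.transpose(ops.cast(self.layer.kernel, "float32"))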

analyze the model's activations.
tokenizer: A `keras_nlp.Tokenizer` instance (or a similar callable)
that is used to process the `dataset` if it contains strings.
wbits (int, optional): The number of bits to quantize weights to.
Collaborator:

Argument names don't follow code style. They should:

  • Use underscores, e.g. num_samples
  • Use "num" as the prefix for counts
  • Avoid abbreviations unless the word being abbreviated is extremely obvious.

For instance, percdamp should probably be hessian_damping.


@keras_export(["keras.GPTQConfig", "keras.quantizers.GPTQConfig"])
class GPTQConfig:
"""Configuration class for the GPTQ algorithm.
Collaborator:

This docstring should feature code examples.
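
For instance, something along these lines could go in the class docstring (a sketch; argument names mirror the current PR and may change per the naming comments above):

    class GPTQConfig:
        """Configuration class for the GPTQ algorithm.

        Example:

        >>> config = keras.quantizers.GPTQConfig(
        ...     dataset=calibration_texts,
        ...     tokenizer=tokenizer,
        ...     wbits=4,
        ...     group_size=128,
        ... )
        >>> model.quantize("gptq", config=config)
        """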

class GPTQConfig:
"""Configuration class for the GPTQ algorithm.

This class holds all the parameters needed to apply the GPTQ method
Collaborator:

We should explain what the GPTQ method is and why/when a user should use it.

Labels: awaiting review, Gemma (Gemma model specific issues), size:XL
7 participants